Skip to content

Changed min/max workers to single field #266

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Aug 4, 2023

Conversation

Maxusmusti
Copy link
Collaborator

@Maxusmusti Maxusmusti commented Aug 1, 2023

Issue link

#232

What changes have been made

Replaced min_worker and max_worker with a single consistent worker field in cluster config

Verification steps

Create a cluster
check cluster.details()
check the actual cluster details

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • Testing is not required for this change

@Maxusmusti
Copy link
Collaborator Author

One important note, I've noticed a couple of other things while making these changes:

  • I named the config field worker, to be consistent with gpu. Should we update these to workers and gpus for clarity, to show we are specifying an amount? Or is it fine as is?
  • I noticed that cluster.details() and our internal RayCluster object only show/hold one value for cpu, even though we actually have a min_cpu and max_cpu. Should these be updated to also show/hold both the min and the max properly? (aka the opposite problem of what we had with this PR lol)

@tedhtchang
Copy link
Contributor

It works try the following

cluster = Cluster(ClusterConfiguration(local_interactive=local_interactive, namespace=namespace, name=cluster_name, worker=1, min_cpus=1, max_cpus=1, min_memory=4, max_memory=4, gpu=0, instascale=False, machine_types=["m5.xlarge", "p3.8xlarge"]))

image

@jbusche
Copy link
Contributor

jbusche commented Aug 1, 2023

Looks good Mustafa, tried it with:

cluster = Cluster(ClusterConfiguration(
    name='mnisttest',
    namespace='default',
    worker=2,
    min_cpus=2,
    max_cpus=2,
    min_memory=2,
    max_memory=2,
    gpu=0,
    image="quay.io/project-codeflare/ray:2.5.0-py38-cu116",
    instascale=False, # Can be set to false if scaling not needed
    machine_types=["m5.xlarge", "g4dn.xlarge"] # Can be removed if above is false
))

Copy link
Contributor

@tedhtchang tedhtchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May need to update these notebooks
demo-notebooks/guided-demos/notebook-ex-outputs/0_basic_ray.ipynb
demo-notebooks/guided-demos/notebook-ex-outputs/3_basic_interactive.ipynb
demo-notebooks/guided-demos/notebook-ex-outputs/2_basic_jobs.ipynb
demo-notebooks/guided-demos/notebook-ex-outputs/4_gpt.ipynb

@MichaelClifford
Copy link
Contributor

I named the config field worker, to be consistent with gpu. Should we update these to workers and gpus for clarity, to show we are specifying an amount? Or is it fine as is?

I personally like num_workers and num_gpus if you are going to make a change. But definitely not soemthing worth blocking the PR on.

Copy link
Contributor

@tedhtchang tedhtchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm keeping the old notebooks is a great idea.

@jbusche
Copy link
Contributor

jbusche commented Aug 4, 2023

/LGTM Thanks Mustafa, I was able to run on my cluster. Still not quite certain how it all gets folded in so the num_workers and num_gpus begin to work with the codeflare_notebook, but LGTM in testing.

Copy link
Contributor

@jbusche jbusche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested OK on my OC 4.12 cluster

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 4, 2023
Copy link
Contributor

@MichaelClifford MichaelClifford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Maxusmusti changes look good! 🚀

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 4, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jbusche, MichaelClifford

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 4, 2023
@Maxusmusti Maxusmusti merged commit a285ef6 into project-codeflare:main Aug 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants